Kish_HP2: Employee Salaries Analysis

Author

John Kish

Import and Setup

Setup
# loading libraries
library(tidyverse)
library(here)
library(scales)
library(gt)
Import
# locating
here::i_am("analysis/Kish_HP2.qmd")
library(here)

# reading in data
# df was used instead of a longer name for the sake of my sanity in typing
df <- read.csv(here("data/Employee_Salaries.csv"))
Explanation of Data Variables
# Department
# Abbreviation/acronym for the department name. Example: ABS for the Alcohol Beverage Services department

# Department_Name
# the department name for all active, permanent employees of Montgomery County, MD (Maryland)


# Division
# The division for each department. That is, a single department can have multiple divisions and focuses.
# Example: ABS has the divisions that include those of ABS 85 Administrative Services
# and ABS 85 Beer Delivery Operations

# Gender
# the Gender of M and F. Assuming that since this is filings with the government, M is to be the legal sex of the individual M for Male and F for Female.

# Base_Salary
# precise base salaries (including down to the cent) for the individual, at the end of the calendar year.
# this is projected data. Should be accurate, but projection allows for greater flexibiliity.


# Overtime_Pay
# amount of overtime pay an employee/individual received for working beyond what they were required to do


# Longevity_Pay
# Longevity pay gives additional compensation for qualifying employees who have worked for a good deal of time. The longer time that is worked, the greater the longevity pay. Specifics about how long are dealt with in-house.


# Grade
# the standing an employee has within a company. The internal codes for all positions are not provided, but we can safely look at M1, M2, M3, etc. as those are clearly levels of management. There are thousands of grades.

# M4-Senior Manager (Not included in this particular data)
# M3-Middle Manager
# M2-Middle Manager
# M1-First Level Manager

Case Study: Employees of Montgomery County, MD paid in calendar year 2023

Workplace inequalities are a concern to many. We’ll be examining that topic — with the example of reliable and public data from Montgomery County, MD.

The employees in question are all active, permanent employees of Montgomery County, MD (Maryland).

Each employee works in a department, where each department can have multiple divisions.

In addition to one’s guaranteed base pay, an employee could earn overtime pay, longevity pay (when eligible for working at the company for a long time), both (additional compensation), either overtime or longevity pay, or neither.

These forms of pay (specifically, base pay and additional pay) sum to an employee’s total compensation.

Additional detail can be found above in “Explanation of Data Variables” tab.


We will begin by exploring the relationship between base pay and total compensation across all employees.

Then, we will look to see how this relationship develops across gender, management, and department.

Preliminary

First, let’s establish the data that we’ll be looking at

# Computing Total and Additional Compensation
df$Total_Compensation <- df$Base_Salary + df$Overtime_Pay + df$Longevity_Pay
df$Additional_Compensation <- df$Overtime_Pay + df$Longevity_Pay

# Reading Gender as Sex
df <- rename(df, Sex = "Gender")

# str(df)
# checking the datatypes for each variable
# df <- df |> mutate(across(where(is.double), as.integer))
# Double deemed more useful, can change to integer if further analysis needs integer type

# detecting NA Values
sum(is.na(df))

Additional compensation will be the combination of overtime and longevity pay.

Total compensation will be the combination of base pay and additional compensation.

In the data, there are 0 missing values. (Good news!)

Assuming that since this data comes from the public and not the private sphere (and so the data would be collected based on public filings), M is assumed to be the legal sex of the individual — M for Male and F for Female. That is, sex as legality rather than gender as identity seems to be what is implied. Occasionally, the term “gender” might be used here as well to refer to “sex”.

Topic 1 Total Compensation and Overtime

Preliminary Analysis

How many employees are there? How many of them get overtime, longevity, either, or both?

Preliminary Analysis
#each row is a unique employee, for each dataset


# Total Count & cresting Employee number column
Total_Employees <- nrow(df)
df <- mutate(df, Employee = row_number())

# Base Count & Percentage
Employees_Without_Base <- filter(df, Base_Salary == 0.00) 
Employees_With_Base <- anti_join(df, Employees_Without_Base)
Percent_With_Base <- (nrow(Employees_With_Base) / Total_Employees) * 100



# Over Count & Percentage
Employees_Without_Overtime <- filter(df, Overtime_Pay == 0.00)
Employees_With_Overtime <- anti_join(df, Employees_Without_Overtime)
Percent_Overtime <- (nrow(Employees_With_Overtime) / Total_Employees) * 100


# Long Count & Percentage
Employees_Without_Longevity <- filter(df, Longevity_Pay == 0.00)
Employees_With_Longevity <- anti_join(df, Employees_Without_Longevity)
Percent_Longevity <- (nrow(Employees_With_Longevity) / Total_Employees) * 100


# Either Over or Long Count % Percentage
Employees_Without_Over_Long <- filter(df, Overtime_Pay == 0.00 | Longevity_Pay == 0.00)
Employees_With_Over_Long <- anti_join(df, Employees_Without_Over_Long)
Percent_Over_Long <- (nrow(Employees_With_Over_Long) / Total_Employees) * 100

# Add Comp Count % Percentage
Employees_Without_Add_Comp <- filter(df, Additional_Compensation == 0.00)
Employees_With_Add_Comp <- anti_join(df, Employees_Without_Add_Comp)
Percent_Add_Comp <- (nrow(Employees_With_Add_Comp) / Total_Employees) * 100

There are 10291 employees total, where 0 employees have no base salary (100% have a base salary)

Approximately 16.3% have overtime pay, 27.6% have longevity pay, and 16.3% have one or the either, and 66.1% have both.

Diminishing Returns?

Plot Coefficients

Is overtime pay worth it? Does it have diminishing returns? If overtime pay (and other additional compensation) had diminishing returns, we would expect its slope to be less than 1. A slope of greater than 1 indicates that you get more and more total compensation for every bit more overtime and other additional pay you do. A higher slope indicates a greater degree of this scaling — which is desireable!

Preliminary Analysis Coefficients
Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(df))

Over_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Overtime))

Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Longevity))

Over_Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Over_Long))

Add_Comp_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Add_Comp))

Based on the coefficients for the graphs, those who only get base pay have their total compensation scale up the least with those having overtime and longevity pay scaling up the most.

In particular, as is seen in the plots too, overtime pay scales slightly larger than longevity pay.

So, there does seem to be a non-zero difference between the slope of taking on overtime (and getting more pay) + total compensation and the slope of 1 (overtime pay is directly correlational with total compensation)

Plots

Preliminary Analysis Plots
#overall
p <- df |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees") +
  geom_smooth(method = "lm")
p

The slope coefficient (intercept and then slope) is 3269.0284279, 1.070269.

Respective Percentage Total Compensation
p2 <- Employees_With_Overtime |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Those with Overtime Pay") +
  geom_smooth(method = "lm")
p2

The slope coefficient (intercept and then slope) is -1.1910607^{4}, 1.3303052.

Respective Percentage Total Compensation
p3 <- Employees_With_Longevity |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Those with Longevity Pay") +
  geom_smooth(method = "lm")
p3

The slope coefficient (intercept and then slope) is -1.1115563^{4}, 1.2845143.

Respective Percentage Total Compensation
p4 <- Employees_With_Over_Long |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Those with Overtime or longevity pay") +
  geom_smooth(method = "lm")
p4

The slope coefficient (intercept and then slope) is -3.2010085^{4}, 1.5664147.

Respective Percentage Total Compensation
p5 <- Employees_With_Add_Comp |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All those with additional compensation") +
  geom_smooth(method = "lm")
p5

The slope coefficient (intercept and then slope) is -4803.6504064, 1.2194241.

Diminishing Returns? Revisted: Percentages

How much of their total compensation is base + overtime, longevity, either, or both (In terms of percentage?)

Respective Percentage Total Compensation
# addiing to the original df a new column, using mutate, that is the percent of total
# compensation that is each of the relevant factors.

df <- group_by(df, Employee) |>
  mutate(Percent_Over = Overtime_Pay/Total_Compensation)
df$Percent_Over <- round(df$Percent_Over, 3) * 100

df <- group_by(df, Employee) |>
  mutate(Percent_Long = Longevity_Pay/Total_Compensation)
df$Percent_Long <- round(df$Percent_Long, 3) * 100

df <- group_by(df, Employee) |>
  mutate(Percent_Base = Base_Salary/Total_Compensation)
df$Percent_Base <- round(df$Percent_Base, 3) * 100

df <- group_by(df, Employee) |>
  mutate(Percent_Add_Comp = Additional_Compensation/Total_Compensation)
df$Percent_Add_Comp <- round(df$Percent_Add_Comp, 3) * 100

df <- group_by(df, Employee) |>
  mutate(Percent_Over_Long = Base_Salary/Total_Compensation)
df$Percent_Over_Long <- round(df$Percent_Over_Long, 3) * 100

Coefficients of the percentage things

Preliminary Analysis Plots
Percent_Base_Total_lm <- lm(Total_Compensation ~ Percent_Base, 
           data = filter(df))

Percent_Over_Total_lm <- lm(Total_Compensation ~ Percent_Over, 
           data = filter(df))

Percent_Long_Total_lm <- lm(Total_Compensation ~ Percent_Long, 
           data = filter(df))

Percent_Over_Long_Total_lm <- lm(Total_Compensation ~ Percent_Over_Long, 
           data = filter(df))

Percent_Add_Comp_Total_lm <- lm(Total_Compensation ~ Percent_Add_Comp, 
           data = filter(df))

In terms of these slope coefficients, longevity pay has the greatest scaling. Interestingly, the more percentage base pay you have, the less your total compensation! (negative slope!)

Plots

Preliminary Analysis Plots
p6 <- df |>
  group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Base, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Percent Base", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All those with additional compensation") +
  geom_smooth(method = "lm")
p6

The slope coefficient (intercept and then slope) is 2.4535346^{5}, -1576.4445085.

Preliminary Analysis Plots
p10 <- df |>
  group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Over, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Percent Add Comp", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All those with additional compensation") +
  geom_smooth(method = "lm") +
  facet_wrap(~Sex)
p10

The slope coefficient (intercept and then slope) is 9.0650397^{4}, 1428.4602409.

Preliminary Analysis Plots
p7 <- df |>
  group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Long, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Percent Long", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All those with additional compensation") +
  geom_smooth(method = "lm")
p7

The slope coefficient (intercept and then slope) is 9.406297^{4}, 4668.7561887.

Preliminary Analysis Plots
p8 <- df |>
  group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Over_Long, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Percent Over Long", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All those with additional compensation") +
  geom_smooth(method = "lm")
p8

The slope coefficient (intercept and then slope) is 2.4535346^{5}, -1576.4445085.

Preliminary Analysis Plots
p9 <- df |>
  group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Add_Comp, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Percent Add Comp", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All those with additional compensation") +
  geom_smooth(method = "lm") +
  facet_wrap(~Sex)
p9

The slope coefficient (intercept and then slope) is 8.7709012^{4}, 1576.4445085.

Again, in terms of these slope coefficients, longevity pay has the greatest scaling. Perhaps there are other things going on in that are affecting this (higher skill = higher pay?)

Topic 2 Gender

Gender is one of the most cited inequalities anywhere, let alone the workplace.

We will be going through our previous analysis, but this time separate Male and Female to see if their scalings differ.

The sex with the higher (positive) scaling benefits more from overtime, longevity pay, etc.

This is a simplified analysis that does not control for how much (if at all) each sex accepts these additional forms of compensation (e.g. overtime) — it is at a higher level. That is, it is looking at the resulting inequalities between males and females regardless of how those inequalities came to be.

Source of Income Coefficients, by Gender

Data Coefficients, by Gender
# Base effect on total compensation, by sex
M_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(df, Sex =="M"))
F_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(df, Sex == "F"))

# Over
M_Over_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Overtime, Sex =="M"))
F_Over_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Overtime, Sex == "F"))

# Long
M_Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Longevity, Sex =="M"))
F_Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Longevity, Sex == "F"))

# Over Long
M_Over_Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Over_Long, Sex =="M"))
F_Over_Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Over_Long, Sex == "F"))



#Add Comp
M_Add_Comp_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Add_Comp, Sex =="M"))
F_Add_Comp_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary, 
           data = filter(Employees_With_Add_Comp, Sex == "F"))

Percentage Source of Income Coefficients, by Gender

NOTE: ignored in analysis, deemed insignificant to the question of gender

Data Percent Coefficients, by Gender
# Percent Base
Gender_Percentage_M_lm <- lm(Total_Compensation ~ Percent_Base, 
           data = filter(df,Sex =="M"))

Gender_Percentage_F_lm <- lm(Total_Compensation ~ Percent_Base, 
           data = filter(df, Sex =="F"))


# Percent Over
Gender_Percentage_Over_M_lm <- lm(Total_Compensation ~ Percent_Over, 
           data = filter(df, Sex =="M"))

Gender_Percentage_Over_F_lm <- lm(Total_Compensation ~ Percent_Over, 
           data = filter(df, Sex =="F"))



# Percent Long
Gender_Percentage_Long_M_lm <- lm(Total_Compensation ~ Percent_Long, 
           data = filter(df, Sex =="M"))

Gender_Percentage_Long_F_lm <- lm(Total_Compensation ~ Percent_Long, 
           data = filter(df, Sex =="F"))


# Percent Over Long
Gender_Percentage_Over_Long_M_lm <- lm(Total_Compensation ~ Percent_Over_Long, 
           data = filter(df, Sex =="M"))

Gender_Percentage_Over_Long_F_lm <- lm(Total_Compensation ~ Percent_Over_Long, 
           data = filter(df, Sex =="F"))


# Percent Add Comp
Gender_Percentage_Add_Comp_M_lm <- lm(Total_Compensation ~ Percent_Add_Comp, 
           data = filter(df, Sex =="M"))

Gender_Percentage_Add_Comp_F_lm <- lm(Total_Compensation ~ Percent_Add_Comp, 
           data = filter(df, Sex =="F"))

Gender Analysis Plots

Source of Income, by Gender

Gender Plot 1
#overall
p1 <- df |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees") +
  geom_smooth(method = "lm") +
  facet_wrap(~Sex)
p1

The slope coefficient (intercept and then slope) for males is 5255.617831, 1.0914173 and for females it is 2325.7268239, 1.0198384.

Gender Plot 2
p2 <- Employees_With_Overtime |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Those with Overtime Pay") +
  geom_smooth(method = "lm") +
  facet_wrap(~Sex)
p2

The slope coefficient (intercept and then slope) for males is -1.2171035^{4}, 1.3550508 and for females it is-5720.6699864, 1.1905479.

Gender Plot 3
p3 <- Employees_With_Longevity |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Those with Longevity Pay") +
  geom_smooth(method = "lm") +
  facet_wrap(~Sex)
p3

The slope coefficient (intercept and then slope) for males is -1.0675483^{4}, 1.3218743 and for females it is -2616.177079, 1.1207052.

Gender Plot 4
p4 <- Employees_With_Over_Long |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Those with Overtime or longevity pay") +
  geom_smooth(method = "lm") +
  facet_wrap(~Sex)
p4

The slope coefficient (intercept and then slope) for males is -3.2236932^{4}, 1.5862565 and for females it is -1.7913256^{4}, 1.3598861.

Gender Plot 5
p5 <- Employees_With_Add_Comp |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All those with additional compensation") +
  geom_smooth(method = "lm") +
  facet_wrap(~Sex)
p5

The slope coefficient (intercept and then slope) for males is -5492.0835418, 1.2586281 and for females it is -261.9011481, 1.0971166.


It looks like males are consistently getting higher slopes than their counterparts.

p6 <- df |>
  group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Base, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Percent Base", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All those with additional compensation") +
  geom_smooth(method = "lm")
p6
`geom_smooth()` using formula = 'y ~ x'

Topic 3 Management

The data provided was not too clear on the types of jobs each employee had. However, I was able to pick out those jobs dealing with management. Do these jobs follow the trends we saw earlier? Both in terms of additional compensation scaling and men benefiting from this more? Due to limitations in the data (the spread of the managerial data does not allow for proper slope analysis) we will be relying on visuals and summaries.

Management was broken down into categories: where M4-Senior Manager (Not included in this particular data)

M3-Middle Manager M2-Middle Manager M1-First Level Manager

Management Data Data-Tidying

Management Data Tidying
#filter
Management_1 <- filter(df, Grade == "M1")
Management_2 <- filter(df, Grade == "M2")
Management_3 <- filter(df, Grade == "M3")

#combine 
Management_df <- bind_rows(Management_1, Management_2, Management_3)

Is Management Data similar to The original? Are Male and Female equally represented?

Management Data Comparability
# all
Row_Management_df <- Management_df |> nrow()

# M F distinction
Row_M_Management_df <- Management_df |> filter(Sex =="M") |> nrow()
Row_F_Management_df <- Management_df |> filter(Sex =="F") |> nrow() 

# Male Managerial distinction
Row_M_M1_Management_df <-  Management_df |> filter(Grade == "M1") |> filter(Sex =="M") |> nrow()
Row_M_M2_Management_df <- Management_df |> filter(Grade == "M2") |> filter(Sex =="M") |> nrow()
Row_M_M3_Management_df <- Management_df |> filter(Grade == "M2") |> filter(Sex =="M") |> nrow()

# Female Managerial distinction
Row_F_M1_Management_df <- Management_df |> filter(Grade == "M1") |> filter(Sex =="F") |> nrow()
Row_F_M2_Management_df <- Management_df |> filter(Grade == "M2") |> filter(Sex =="F") |> nrow()
Row_F_M3_Management_df <- Management_df |> filter(Grade == "M2") |> filter(Sex =="F") |> nrow()

There are 446 employees in management. 214 of those are male and 232 of those are female.

Between the management levels, males and females seem to be equally represented.

In M1, there are 17 males and 11 females.

In M2, there are 73 males and 74 females.

In M3, there are 73 males and 74 females.

Management Plots

Management Plots
p <- Management_df |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees") +
  geom_smooth(method = "lm") +
  facet_wrap(~ Grade)
p

The slopes look pretty similar…

Management Plots
p <- Management_df |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees") +
  geom_smooth(method = "lm") +
  facet_wrap(~ Sex)
p

The slopes look pretty even across the sexes too, what if we combined them?

Management Plots
p2 <- Management_df |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
  geom_point(aes(color = "red")) +
  scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees") +
  geom_smooth(method = "lm") +
  facet_grid(rows = vars(Grade), cols = vars(Sex))
p2

Yeah, the slopes look pretty similar across the charts! Remember, there is not a lot of management specific data, so some lines might appear to have more or less extension than they otherwise would.

Management Summary

Let’s look at the the management data to see some specific numbers that wouldn’t show up in a graph

Comparability

First, let’s see how employees with management jobs compare to employees at large — to see if we picked an okay sample.

Mean Base Salary Management
Management_df |>
  group_by(Sex) |>
summarize(mean = mean(Base_Salary)) |> gt()|>
  opt_row_striping() |> 
  opt_stylize(style = 4, color = "red")
Sex mean
F 150932.1
M 155994.6
Mean Base Salary All
df |>
  group_by(Sex) |>
summarize(mean = mean(Base_Salary)) |> gt()|>
  opt_row_striping() |> 
  opt_stylize(style = 4, color = "red")
Sex mean
F 87497.50
M 92382.93

Well, to maybe not too much suprise, the mean base salaries for management are much higher than average.

Mean Total Compensation Management
Management_df |>
  group_by(Sex) |>
summarize(mean = mean(Total_Compensation)) |> gt()|>
  opt_row_striping() |> 
  opt_stylize(style = 4, color = "red")
Sex mean
F 150968.2
M 157011.8
Mean Total Compensation All
df |>
  group_by(Sex) |>
summarize(mean = mean(Total_Compensation)) |> gt()|>
  opt_row_striping() |> 
  opt_stylize(style = 4, color = "red")
Sex mean
F 91559.04
M 106083.94

Interestingly too, the mean total compensation salaries for management are much higher than average. Also, it is important to note here that there is little additional compensation above base salary for management, so perhaps this wasn’t the most ideal choice!

Comparability: Sex
Mean Total Compensation Management Male
head(Management_df |>
filter(Sex == "M") |> 
summarize(Total_Compensation) |> 
arrange(desc(Total_Compensation)), 100) |> gt()|>
  opt_row_striping() |> 
  opt_stylize(style = 4, color = "red")
Employee Total_Compensation
1323 256993.0
591 196059.0
1679 196059.0
1681 196059.0
2060 196059.0
3424 196059.0
7555 196059.0
7569 196059.0
10216 196059.0
10257 196059.0
7799 196040.7
7117 195794.5
501 193460.3
5669 193460.3
2923 191726.0
2017 190784.7
2924 190000.0
3252 183625.7
469 176948.0
1 175873.0
515 175873.0
529 175873.0
572 175873.0
578 175873.0
846 175873.0
1448 175873.0
1458 175873.0
1517 175873.0
1587 175873.0
1618 175873.0
1694 175873.0
1837 175873.0
2053 175873.0
2061 175873.0
2118 175873.0
2296 175873.0
2363 175873.0
3246 175873.0
3445 175873.0
3505 175873.0
3585 175873.0
3587 175873.0
3590 175873.0
3600 175873.0
3632 175873.0
3877 175873.0
5309 175873.0
5668 175873.0
5735 175873.0
7573 175873.0
7760 175873.0
7776 175873.0
7856 175873.0
7859 175873.0
9684 175873.0
10155 175873.0
10215 175873.0
10224 175873.0
10268 175872.4
364 175861.2
3568 175409.9
6413 175273.1
7594 175099.4
844 174796.6
1688 174396.4
1682 174093.2
7802 172457.9
7814 170084.7
3608 169873.5
7777 168747.6
5147 168258.1
2062 168234.0
6277 167350.4
9030 165543.8
6165 158075.4
437 157272.3
919 155976.0
3910 154769.8
1360 154508.5
2348 154277.8
7870 154201.7
7682 153930.2
239 152940.0
1412 152940.0
1520 152940.0
1593 152940.0
1612 152940.0
1619 152940.0
1635 152940.0
1764 152940.0
1798 152940.0
2021 152940.0
2069 152940.0
2077 152940.0
2298 152940.0
2318 152940.0
2323 152940.0
2371 152940.0
3247 152940.0
3266 152940.0
Mean Total Compensation Management Female
head(Management_df |>
filter(Sex == "F") |> 
summarize(Total_Compensation) |> 
arrange(desc(Total_Compensation)), 100) |> gt()|>
  opt_row_striping() |> 
  opt_stylize(style = 4, color = "red")
Employee Total_Compensation
859 196059.0
1389 196059.0
3584 196059.0
7122 196059.0
7818 196059.0
7558 194520.4
5680 188679.9
7752 185609.7
722 180704.1
3599 179833.0
606 179259.9
536 178510.6
521 177064.1
537 175873.0
555 175873.0
584 175873.0
593 175873.0
1308 175873.0
1557 175873.0
1669 175873.0
1680 175873.0
1996 175873.0
2067 175873.0
2775 175873.0
3458 175873.0
3586 175873.0
3595 175873.0
3598 175873.0
3648 175873.0
5156 175873.0
5158 175873.0
5315 175873.0
5488 175873.0
5788 175873.0
6065 175873.0
6086 175873.0
7036 175873.0
7687 175873.0
7779 175873.0
8864 175873.0
9861 175873.0
10129 175873.0
10162 175873.0
10206 175873.0
10174 175867.1
10159 175259.7
604 174876.5
6275 174165.9
6308 174165.9
834 174000.0
3421 172415.2
608 172026.6
7797 171540.0
10212 171540.0
9712 171520.4
9859 170930.7
6064 170203.8
824 169159.0
8977 169114.8
7745 168128.4
6066 167791.7
7815 167314.0
5802 167227.2
115 166140.0
9721 165998.0
5164 164333.0
8697 158007.4
5163 157550.0
5507 156050.5
7559 155747.5
1670 153000.7
465 152940.0
471 152940.0
504 152940.0
579 152940.0
1306 152940.0
1433 152940.0
1477 152940.0
1547 152940.0
1566 152940.0
1609 152940.0
1961 152940.0
2008 152940.0
2036 152940.0
2066 152940.0
2364 152940.0
3387 152940.0
3577 152940.0
3589 152940.0
3645 152940.0
3893 152940.0
4066 152940.0
5761 152940.0
6483 152940.0
6498 152940.0
6574 152940.0
6835 152940.0
7664 152940.0
7698 152940.0
8973 152940.0

Sorry for the scrolling, but in terms of the “glass-ceiling” — it seems like the top earners are males, with female compensation dropping off more quickly than male compensation

Topic 4: Department

Was our data consistent among all departments? Are one or a few departments possible skewing the data (and the above discussed scaling?)

Department Plots

Focus: Sex

Management Summary
p2 <- df |>
ggplot(mapping = aes(x = Department, y = Total_Compensation)) +
  geom_point(aes(color = Sex, shape = Sex, size = Base_Salary)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees") 
p2

It seems like M and F are pretty okay in terms of the spread. FRS and POL looks male dominated and HHS looksfemale dominated.

Focus: Percent Base Salary & Additional Compensation

Base Salary
Management Summary
p2 <- df |>
ggplot(mapping = aes(x = Department, y = Total_Compensation)) +
  geom_point(aes(color = Percent_Base, size = Sex)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees") 
p2

COR, FRS, and POL stick out as having higher total compensation, and this compensation (shown in dark blue) becomes higher at the point where it no longer is because of their base salary as much.

Additional Compensation
Management Summary
p2 <- df |>
ggplot(mapping = aes(x = Department, y = Total_Compensation)) +
  geom_point(aes(color = Percent_Add_Comp, size = Sex)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees") 
p2

That is, same as above, you can see COR, FRS, and POL start to have additional compensation right where they start to get above the rest.

Focus: Combine them! (Gender & Percent Base Salary)

Management Summary
p2 <- df |>
ggplot(mapping = aes(x = Department, y = Total_Compensation)) +
  geom_point(aes(color = Percent_Base, shape = Sex, size = Base_Salary)) +
  scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
    labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees") 
p2

Here’s an intersting visual showing the same information as the two previous — but here again we still se COR, FRS, and POL sticking out.

Those three lines with lower percent bases and higher total comp are COR FRS and POL.

Management Summary
#   Correction and Rehabilitation
head(filter(df, Department == "COR"), 1) |>
  select(1:2) |>gt()|>
  opt_row_striping() |> 
  opt_stylize(style = 4, color = "red")
Department Department_Name
862
COR Correction and Rehabilitation
Management Summary
head(filter(df, Department == "FRS"), 1) |>
  select(1:2)|> gt()|>
  opt_row_striping() |> 
  opt_stylize(style = 4, color = "red")
Department Department_Name
3690
FRS Fire and Rescue Services
Management Summary
head(filter(df, Department == "POL"), 1) |>
  select(1:2) |> gt()|>
  opt_row_striping() |> 
  opt_stylize(style = 4, color = "red")
Department Department_Name
7918
POL Department of Police

Based on what the three departments are, as shown in the tables above, do you have any idea why they would stick out? (I don’t have an hypothesis, personally)

Conclusion

We saw evidence that there is a lack of diminishing returns for getting additional compensation

Moreover, males tend to experience less diminishing returns than their counterparts.

The gender of participants seems to be pretty consistent across all departments, and three departments stick out for how they compensate their employees.

Management is not too much like the entire data in terms of mean base salary and mean total compensation, but the slopes of the lines do not look too radical